ABSTRACT

The Mouse Genome Database (MGD)
(http://www.informatics.jax.org) is the community model
organism database resource for the laboratory 
mouse, a premier animal model for the study of 
genetic and genomic systems relevant to human 
biology and disease. MGD maintains a comprehen- 
sive catalog of genes, functional RNAs and other 
genome features as well as heritable phenotypes 
and quantitative trait loci. The genome feature 
catalog is generated by the integration of computa- 
tional and manual genome annotations generated 
by NCBI, Ensembl and Vega/HAVANA. MGD 
curates and maintains the comprehensive listing of 
functional annotations for mouse genes using the 
Gene Ontology, and MGD curates and integrates 
comprehensive phenotype annotations including 
associations of mouse models with human 
diseases. Recent improvements include integration 
of the latest mouse genome build (GRCm38), 
improved access to comparative and functional an- 
notations for mouse genes with expanded represen- 
tation of comparative vertebrate genomes and new 
loads of phenotype data from high-throughput 
phenotyping projects. All MGD resources are freely 
available to the research community. 

INTRODUCTION 

The Mouse Genome Database (MGD) is the community 
model organism database resource for the laboratory 
mouse, a premier animal model for the study of genetic 
and genomic systems relevant to human biology and 
disease. Initially designed and implemented in 1994 to 
track genetic mapping data and to report on and 
describe mouse mutant phenotypes, MGD has grown to 
be the recognized authority for knowledge about mouse
genes and a comprehensive data integration site and
repository for mouse genetic, genomic and phenotypic 
data derived from primary literature as well as from 
major data providers (1,2). 

The central mission of the MGD is to support the trans- 
lation of information from experimental mouse models to 
uncover the genetic basis of human diseases. As a highly 
curated and comprehensive model organism database, 
MGD provides web and programmatic access to a 
complete catalog of mouse genes and genome features 
including genomic sequence and variant information. 
MGD curates and maintains the comprehensive listing 
of functional annotations for mouse genes using Gene 
Ontology (GO) terms and contributes to the development 
of the GO content and structure (3). Finally, MGD 
curates and integrates comprehensive phenotype annota- 
tions wherein phenotypes are associated with genotypes 
using terms from the Mammalian Phenotype Ontology 
(4) and are represented with precise associations to 
relevant human diseases. These workflows enable 
detailed descriptions of the relevance and relationship of 
mouse models to human diseases (5). 

MGD is a core component of an extensive set of 
genome informatics resources that collectively comprise 
the Mouse Genome Informatics (MGI) resource
(http://www.informatics.jax.org). The MGI system includes the
Gene Expression Database (6), the Mouse Tumor 
Biology Database (7) and the MouseCyc database of bio- 
chemical pathways (8), and provides the authoritative set 
of GO annotations for the laboratory mouse as a founding 
member of the GO Consortium (9). The MGI system 
overall provides an intensively integrated and accessible 
data resource representing the highest quality and most 
comprehensive consensus and experimental views of the
laboratory mouse as an experimental organism.

IMPROVEMENTS 

Recent improvements to MGD include integration of the 
latest mouse genome build (GRCm38), improved access to 
comparative and functional annotations for mouse genes
and new uploads of phenotype data from the Sanger 
Institute Mouse Genetics Program (10) and the 
Europhenome (EuPh) Database (11). Major improve- 
ments have been made in access to strains and genes 
associated with Cre-recombinase constructs (used for 
cell- and tissue-specific expression) through new imple- 
mentation of the CrePortal. A summary of the database 
content for MGD (September 2013) is given in Table 1. 

Genome feature updates 

MGD maintains a comprehensive catalog of genes, func- 
tional RNAs and other genome features as well as herit- 
able phenotypes and quantitative trait loci (QTL). As 
described previously (12), the MGD genome feature 
catalog is generated by the integration of computational 
and manual genome annotations generated by NCBI, 
Ensembl and Vega/HAVANA. The genome coordinates
for these features and phenotypes were updated to the
latest mouse genome assembly (GRCm38.p1). In
addition to the standard web-based access to this 
catalog, the unified genome catalog is available via ftp in
Generic Feature Format (GFF)
(ftp://ftp.informatics.jax.org/pub/mgigff/) to support
the use of these data in bioinformatics applications.
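
As a minimal sketch of how such a file might be consumed, the Python below parses one GFF3-style line into the nine standard tab-separated columns. The sample line reuses the Serpina1a coordinates shown in Figure 1; the source and type columns, the attribute keys and the helper name parse_gff_line are illustrative assumptions and are not taken from the MGI file itself.

```python
from typing import Dict, NamedTuple, Optional
from urllib.parse import unquote


class GffFeature(NamedTuple):
    seqid: str
    source: str
    feature_type: str
    start: int                 # 1-based, inclusive
    end: int                   # 1-based, inclusive
    score: Optional[float]     # None when the column is "."
    strand: str
    phase: Optional[int]       # None when the column is "."
    attributes: Dict[str, str]


def parse_gff_line(line: str) -> Optional[GffFeature]:
    """Parse one GFF3 line; return None for comments and blank lines."""
    line = line.rstrip("\n")
    if not line.strip() or line.startswith("#"):
        return None
    cols = line.split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    seqid, source, ftype, start, end, score, strand, phase, attrs = cols
    attributes: Dict[str, str] = {}
    for pair in attrs.split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attributes[key.strip()] = unquote(value.strip())
    return GffFeature(
        seqid, source, ftype, int(start), int(end),
        None if score == "." else float(score),
        strand,
        None if phase == "." else int(phase),
        attributes,
    )


# Hypothetical line layout; only the coordinates and IDs come from Figure 1.
sample = "12\tMGI\tgene\t103853589\t103863562\t.\t-\t.\tID=MGI:891971;Name=Serpina1a"
feature = parse_gff_line(sample)
feature_length = feature.end - feature.start + 1
print(feature.attributes["Name"], feature.strand, feature_length)
```

Because GFF coordinates are 1-based and inclusive, the feature length is end - start + 1.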

Single nucleotide polymorphism (SNP) data in MGD 
were updated to NCBI dbSNP Build 137. With this 
update, all of the SNP and structural variation data 
from deep sequencing of 17 inbred strains of mice (13) 
are now integrated into MGD. 

New implementation and visualization of comparative 
genomic data 

Incorporating external ortholog sets 
When the MGD was first created, it curated the mouse to 
human orthology set using sequence analysis tools and 
reporting sequence-based orthology as described in scien- 
tific publications. Orthology assertions form the basis for 



Table 1. Summary of MGD content September 2013: stats from
September 8 MGI public stats page

Current Stats                                              September 2013

Number of genes with protein sequence data                         24 526
Number of mouse genes with human orthologs                         17 092
Number of mouse genes with rat orthologs                           17 811
Number of genes with GO annotations                                25 495
Total number of GO annotations                                    257 164
Number of mutant alleles (cell lines only)                        712 925
Targeted mutations                                                 50 569
Number of mutant alleles in mice                                   34 538
Number of QTL                                                        4714
Number of genotypes with phenotype annotation (MP)                 48 862
Total number of MP annotations                                    254 327
Number of mouse models associated to human diseases                  4130
Number of human diseases with one or more mouse models               1256
Number of references in the MGD bibliography                      193 943

functional predictions that exploit comparative relationships
to infer function for mouse genes from experimentally
determined knowledge in other organisms,
particularly from experimental knowledge determined
about human and rat genes. Although MGD has been
incorporating orthology data from the NCBI HomoloGene
resource (14) for some years, these data were still represented
within the context of MGD homology assertions
and specifically restricted the relationships to a 1:1 assertion
of orthology among mammals. In 2013, MGI implemented
a many-to-many orthology paradigm to better
reflect current understanding about the relationships
between genes of these organisms. Although one-to-one
orthology assertions between mouse/human/rat genes
still hold for 98% of protein-coding genes, MGI can
now more clearly represent cases such as Serpina1a
(MGI:891971), where phylogenetic analysis shows five
mouse genes and one human gene in the same orthology
class (Figure 1). In addition, MGD is now importing
orthology data from HomoloGene, although revisions
in the MGD schema support importation of orthology
sets from any external resource.
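
The difference between a 1:1 assertion and a many-to-many homology class can be illustrated with a small Python sketch. It treats a class as a list of (species, gene symbol) members and derives the relationship between two species from the member counts. The class contents follow Figures 1 and 2; the data structure and the function name relationship are assumptions made for illustration and do not describe the MGD schema.

```python
from collections import Counter
from typing import List, Tuple

# A homology class is modeled here as a list of (species, gene symbol) members.
HomologyClass = List[Tuple[str, str]]

# Human and mouse members of HomoloGene class 20103 (Figure 1).
serpina1_class: HomologyClass = [
    ("human", "SERPINA1"),
    ("mouse", "Serpina1a"),
    ("mouse", "Serpina1b"),
    ("mouse", "Serpina1c"),
    ("mouse", "Serpina1d"),
    ("mouse", "Serpina1e"),
]

# Human, mouse and rat members of HomoloGene class 7247 (Figure 2).
bmp4_class: HomologyClass = [
    ("human", "BMP4"),
    ("mouse", "Bmp4"),
    ("rat", "Bmp4"),
]


def relationship(members: HomologyClass, species_a: str, species_b: str) -> str:
    """Describe the species_a-to-species_b relationship within one class."""
    counts = Counter(species for species, _symbol in members)
    a, b = counts[species_a], counts[species_b]
    if a == 0 or b == 0:
        return "none"
    label_a = "one" if a == 1 else "many"
    label_b = "one" if b == 1 else "many"
    return f"{label_a}-to-{label_b}"


print(relationship(bmp4_class, "mouse", "human"))
print(relationship(serpina1_class, "mouse", "human"))
```

Under a strict 1:1 model the Serpina1 class could keep only one of the five mouse genes; storing the full member list avoids that loss.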

Extending representation beyond mammals to include 
other vertebrate species 

With the revision of the homology data, we extended the 
comparative data coverage from 'mammalian' to 'verte- 
brate' inclusion in MGD. The orthology data views, there- 
fore, now include information from chicken and zebrafish 
genomes. As part of the updating process, we also now 
represent new graph views of comparative GO annota- 
tions for experimentally determined data from human, 
mouse and rat (Figure 2). 

Improvements in GO annotation completeness 
and visualization 

MGD, a founding member of the GO Consortium, 
provides the comprehensive set of mouse functional anno- 
tations using the GO for mouse genes and gene products. 
MGD curators contribute to the development of the GO 
ontologies and participate in a variety of GO Consortium 
working groups including the PAINT phylogenetic 
analysis for functional annotation project (15). MGD ex- 
pertise in curation of the biomedical literature provides 
the core experimental data used to infer function for 
orthologous genes in a broad comparative genomics 
context. 

Following the revision of the MGD representation of 
vertebrate orthology, the GO team at MGD implemented 
new rules for loading of Inferred from Sequence 
Orthology annotations from other vertebrate species. 
These annotations are only generated when the 
contributing annotation is derived from experimental 
results in the specific organism [e.g. experimental data 
from human gene BMP4 (UniProt record P12644)
provides data through PMID 7811286 to assert through 
evidence code Inferred from Sequence Orthology that 
mouse Bmp4 has inferred molecular function 'BMP 
receptor binding' (GO:0070700)]. 
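
The loading rule described above can be sketched in Python as a filter: an Inferred from Sequence Orthology (ISO) annotation is produced for the mouse gene only when the contributing annotation carries an experimental evidence code. The set of experimental codes is the one listed in Figure 2. The record layout, the function name infer_iso, the evidence code assigned to the human BMP4 annotation and the second, non-experimental record are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List

# Experimental evidence codes as listed in Figure 2.
EXPERIMENTAL_CODES = {"EXP", "IDA", "IGI", "IMP", "IPI"}


@dataclass(frozen=True)
class Annotation:
    gene: str            # gene symbol, qualified by species for clarity
    go_id: str
    evidence: str
    reference: str
    inferred_from: str = ""   # source gene for ISO annotations


def infer_iso(source_annotations: Iterable[Annotation],
              mouse_ortholog: Dict[str, str]) -> List[Annotation]:
    """Create mouse ISO annotations from experimental ortholog annotations."""
    inferred = []
    for ann in source_annotations:
        mouse_gene = mouse_ortholog.get(ann.gene)
        if mouse_gene is None:
            continue  # no mouse ortholog known for this gene
        if ann.evidence not in EXPERIMENTAL_CODES:
            continue  # rule: only experimental results may be propagated
        inferred.append(Annotation(mouse_gene, ann.go_id, "ISO",
                                   ann.reference, inferred_from=ann.gene))
    return inferred


human_annotations = [
    # GO term and PMID are from the text; the IPI code is assumed.
    Annotation("human:BMP4", "GO:0070700", "IPI", "PMID:7811286"),
    # Hypothetical electronic annotation that must not be propagated.
    Annotation("human:BMP4", "GO:0005125", "IEA", "hypothetical reference"),
]
orthologs = {"human:BMP4": "mouse:Bmp4"}

iso_annotations = infer_iso(human_annotations, orthologs)
for ann in iso_annotations:
    print(ann.gene, ann.go_id, ann.evidence, ann.reference)
```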

[Figure 1 image: Vertebrate Homology Class page for HomoloGene class 20103 (source: HomoloGene Release 67, 12 December 2012). Columns: Species, Symbol, Gene Links, Genetic Location, Genome Coordinates (mouse and human only), Associated Human Diseases, Sequences. Members shown: human SERPINA1; mouse Serpina1a, Serpina1b, Serpina1c, Serpina1d and Serpina1e; and homologs in rat, chimpanzee, rhesus macaque, cattle, dog, chicken and zebrafish.]

Figure 1. Homology Detail Page: Complete orthology set representation in MGD, derived from HomoloGene cluster 20103.



Major revisions in the CrePortal 

Studies of cell-type and stage-specific gene regulation and 
function often use conditional mutagenesis, in which genes 
can be knocked out at specific sites in a spatial and 
temporal manner. To effectively use conditional mutagen- 
esis, mice carrying an appropriate recombinase (e.g. Cre) 
construct are required for mating to mice bearing condi- 
tional-ready loxP-flanked genes. 

The CrePortal (http://www.creportal.org) provides 
critical data about Cre constructs, including the driver, 
whether recombinase activity is inducible (and by what), 
strain availability through public repositories and publica- 
tions describing conditional mutagenesis done using each 
Cre allele. Histological images from assays defining Cre
specificity, annotated with activity patterns, anatomical
structures and ages, can assist selection of optimal Cre-bearing
strains for specific experiments. Whole slide viewing is
available for some Cre lines, notably those submitted by JAX Mice
and the Allen Institute for Brain Science. Access to Cre
specificity data is critical in determining the best Cre-bearing
strains for experiments, not only for knowledge about
activity at the desired target (and its time/space distribution),
but also for considering 'off-target' activity that may
complicate interpretation of observed phenotypes. Through
links to MGI (http://www.informatics.jax.org), phenotypic 
information for conditional genotypes that have been 
studied is also provided. As of September 2013, there were 
>2000 unique Cre alleles cataloged in the CrePortal. 

Important new features have been implemented in the 
CrePortal in response to user comments (Figure 3). These 

[Figure 2 image: Comparative GO Graph (mouse, human, rat) for HomoloGene class 7247 (human BMP4, mouse Bmp4, rat Bmp4), Molecular Function panel. Nodes are colored by organism; edges show 'is_a', 'part_of', 'regulates', 'positively_regulates' and 'negatively_regulates' relations. Only experimental annotations are displayed: EXP, IDA, IGI, IMP, IPI.]

Figure 2. GO Comparative Graphs: Experimentally derived GO annotations available for human, rat and mouse for the Molecular Function domain.
In the mouse annotation file, these human and rat annotations form the basis for Inferred from Sequence Orthology annotations for the mouse.



include (i) the ability to search for Cre activity by specific 
tissue or structure such as 'left ventricle cardiac muscle' 
(formerly only searches by anatomical system were avail- 
able, e.g. cardiovascular system); (ii) a summary matrix of 
Cre activity in structures/tissues assayed versus age, e.g. for 
left ventricle cardiac muscle, one can visualize its activity by 
age distribution; also note off-target embryonic expression 
in liver and pharynx; (iii) a new 'Your Observations 
Welcome' link for contributions of laboratory experience 
with particular Cre mouse strains, as many 'facts' about 
laboratory performance of Cre lines remain anecdotal; 
and (iv) a submission form for data and image files on 
new Cre strains, or additional data on existing strains. 
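
The summary matrix in (ii) can be thought of as a structure-by-age table. The Python sketch below, under that assumption, models one allele's matrix as nested dictionaries and lists the structures other than the intended target in which activity was detected. The age bins follow the column headings in Figure 3; the individual cell values, the dictionary layout and the function names are illustrative and do not reproduce the CrePortal data or implementation.

```python
from typing import Dict, List

# Age bins as shown in the recombinase activity matrix of Figure 3.
AGE_BINS = ["E0-8.9", "E9.0-13.9", "E14-19.5", "P0-21", "P22-42", "Adult"]

# structure -> age bin -> True (detected) or False (not detected).
# A missing age bin means the structure was not assayed at that age.
ActivityMatrix = Dict[str, Dict[str, bool]]

# Illustrative values only, loosely modeled on Tg(Nkx2-5-cre)9Eno.
activity: ActivityMatrix = {
    "left ventricle cardiac muscle": {"E9.0-13.9": True, "Adult": True},
    "pharynx": {"E9.0-13.9": True},
    "liver": {"E14-19.5": True},
    "limbs": {"E14-19.5": False},
}


def detected_ages(matrix: ActivityMatrix, structure: str) -> List[str]:
    """Age bins, in developmental order, where activity was detected."""
    cells = matrix.get(structure, {})
    return [age for age in AGE_BINS if cells.get(age) is True]


def off_target(matrix: ActivityMatrix, target: str) -> Dict[str, List[str]]:
    """Structures other than the target that show detected activity."""
    result: Dict[str, List[str]] = {}
    for structure in matrix:
        if structure == target:
            continue
        ages = detected_ages(matrix, structure)
        if ages:
            result[structure] = ages
    return result


target = "left ventricle cardiac muscle"
print(target, detected_ages(activity, target))
print("off-target:", off_target(activity, target))
```

Keeping "not detected" distinct from "not assayed" matters here: only an explicit negative result argues against off-target activity.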



Integration of high-throughput phenotype data 

MGI now includes high-throughput phenotyping data 
along with data submitted from laboratories and centers, 
and curated data from publications, providing compre- 
hensive comparative phenotypes for mouse mutants. 
Current high-throughput data sets include those from 
the Wellcome Trust Sanger Institute (WTSI) (10) and 
the EuPh (11) database. This integration allows specific 
comparisons between different centers' data interpret- 
ations and is a prelude to future MGI data integration 
from the International Mouse Phenotyping Consortium 
project sites and Data Coordination Center (16). 

Figure 3. New features in the CrePortal: (a) Cre Search Form accessed at www.creportal.org or by choosing the Recombinase (Cre) icon box at
www.informatics.jax.org. The term 'left ventricle cardiac muscle' has been entered in the search box. This field has an auto-complete feature that
indicates both annotated terms (black type) and terms with no annotation (gray type). (b) Results from the above search for 'left ventricle cardiac
muscle'. Two Cre transgenes that show specific Cre activity in this tissue are returned. Formerly only systems searches were allowed; a search by
'cardiovascular system' returns 266 transgenes and knock-ins, most of which are not specific for left ventricle cardiac muscle (searches done 29
September 2013). (c) Overview of data for the Cre transgene, Tg(Nkx2-5-cre)9Eno, clicking through from the symbol on the Results page in (b). The
recombinase activity matrix shows activity for this Cre transgene in structures/tissues assayed versus age. Note that anatomical systems have been
toggled open to show specific tissues. For left ventricle cardiac muscle, one can visualize its activity by age distribution, and also note off-target
embryonic expression of this Cre transgene in pharynx and liver.

Importantly, this integration allows comparisons of
knockout mouse phenotypes called from high-through- 
put phenotyping pipelines versus calls of the same 
mouse data by other analysis groups, or the same 
mouse knockouts phenotypically assessed by research 
groups specializing in particular phenotype assessments. 
In addition, MGI's integrated phenotypic data allow 
comparison of these knockouts with other mutagenized 
alleles of a gene (e.g. ENU mutants, other genetically 
engineered constructs). Such comparison enables new 
hypotheses for gene function based on phenotype anno- 
tations for alleles of a gene and permits comparisons of 
phenotypic range when systematic phenotypic testing on 
defined genetic background is carried out. Figure 4 
shows a portion of the MGD detail page for the 
knockout allele, Sytl1tm1a(KOMP)Wtsi. In the phenotypic
table section, one can observe data from WTSI and the
EuPh database. The specific annotations for the hem- 
atopoietic system are expanded to highlight the clear 
differences between phenotype calls made by the two 
groups using the same data. This difference in data 
interpretation is particularly relevant, as high-through- 
put data are generated, analyzed and integrated in 
different ways. MGD is committed to displaying these
differences, both to inform researchers seeking mice
with particular characteristics and to distinguish
results from high-throughput pipelines, which call
phenotypes using specific algorithms and statistical
cut-offs, from those of traditional wet-bench or
'low-throughput' laboratories, where more granular
phenotype and system studies produce focused sets of
phenotypic analyses.
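
One simple way to picture this comparison is as set operations over the phenotype terms called by each center for the same genotype. In the Python sketch below, the term names are of the kind shown in Figure 4, but the assignment of terms to WTSI or EuPh is invented for illustration, as are the dictionary layout and the function name compare_calls.

```python
from typing import Dict, Set

# Phenotype calls per source for one genotype. The split between the two
# sources is hypothetical; it does not reproduce the calls in Figure 4.
calls: Dict[str, Set[str]] = {
    "WTSI": {
        "decreased erythrocyte cell number",
        "decreased hematocrit",
        "increased mean platelet volume",
    },
    "EuPh": {
        "decreased erythrocyte cell number",
        "decreased T cell number",
    },
}


def compare_calls(first: Set[str], second: Set[str]) -> Dict[str, Set[str]]:
    """Partition phenotype terms into shared and source-specific calls."""
    return {
        "both": first & second,
        "only_first": first - second,
        "only_second": second - first,
    }


comparison = compare_calls(calls["WTSI"], calls["EuPh"])
for group, terms in comparison.items():
    print(group, sorted(terms))
```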



Figure 4. High-throughput Phenotype Data: This partial image of the phenotype detail page for the targeted allele Sytl1tm1a(KOMP)Wtsi illustrates
the ability to compare high-throughput data results. In the 'Phenotypes' section, one is first presented with a colored key of the specific genotypes
analyzed. This key corresponds to the columns of the phenotype table. In this example, the data for the 'hm1' homozygous females and males have
been analyzed by both WTSI and EuPh databases. The hematopoietic system phenotype section has been expanded to show the clear differences in
phenotype calls made on the same animal data sets by these two centers based on their differing analysis methods.

COMMUNITY OUTREACH AND USER SUPPORT

MGD offers extensive resource support through online 
documentation, frequently asked questions, tutorials and 
email and phone access to User Support staff.
User Support can be accessed by: 

• World Wide Web: http://www.informatics.jax.org/mgihome/homepages/help.shtml

• Email access: mgi-help@jax.org 

• Telephone access: +1 207 288 6445 

• FAX access: +1 207 288 6132 

The MGD User Support group also maintains and 
moderates two Email bulletin boards, MGI-LIST and 
MGI-TECHNICAL-LIST, for the mouse research community
(http://www.informatics.jax.org/mgihome/lists/lists.shtml).
Updates to the MGI resources are announced
on the lists, and list subscribers can post questions for 
community discussion. MGI-LIST has >2000 subscribers
and an average of 60 posts/discussions per month.
MGI-TECHNICAL-LIST is a smaller, less active list that is
geared toward computational access to MGI data.



DATA SUBMISSION 

Most of the data in MGD come from semi-automated 
curation of the peer-reviewed scientific literature and 
from collaborative/cooperative arrangements with large 
mouse-related data centers and repositories and other in- 
formatics resources. MGD also supports electronic data 
contributions directly from individual researchers. Any 
type of data that MGD maintains can be submitted as 
an electronic contribution. Other common types of sub- 
mission include mutant and QTL mapping data. Each 
electronic submission receives a permanent database ac- 
cession ID. All data sets are associated with their source, 
either a publication or an electronic submission reference. 
MGD reference pages provide links to associated data 
sets. Online information about data submission procedures
is found at the following URL:
http://www.informatics.jax.org/submit.shtml.



SYSTEM OVERVIEW 

The MGD database, software and hardware are organized 
into a front end, where the data are made available to the 
public, and a back end, where data are loaded and curated 
from various resources. In the past, the front end and back 
end shared a common database structure/schema, but in 
recent years, the two have been decoupled: the new front 
end is tuned for performance and web display, whereas the 
back end is designed to support data curation and integration.
The front end database is highly denormalized
and augmented by Solr/Lucene
(http://lucene.apache.org/solr) indexes. During each weekly data release, data
from the back end are migrated to the front end and Solr/ 
Lucene indexes are populated. In addition to the signifi- 
cant performance improvements enjoyed by the user, the 
decoupling of the front and back ends also helps to limit 
and manage the ripple effects of changing either side. 



The front end-public data access 

MGD provides free public access to its data in a number 
of ways; all are accessible from the main Web site:
http://www.informatics.jax.org. The web interface, the software
that provides the interactive searching and dynamic 
content on the web site, is the most commonly used 
access point. In addition to the simple keyword-based 
'Quick Search' available on every web page, there are a 
variety of forms available for more involved queries 
including searches for Genes and Markers; Phenotypes, 
Alleles and Diseases; SNPs; and References. There are 
also vocabulary browsers for GO, Mammalian 
Phenotype Ontology (MP) and OMIM disease terms 
that support exploration of these vocabularies and 
access to all data in MGD annotated to each vocabulary 
term. Graphical views of the mouse genome and inter- 
active genome browsing are supported by our Generic 
Genome Browser (GBrowse) instance, which was 
recently upgraded to GBrowse 2.X
(http://gbrowse.informatics.jax.org/cgi-bin/gb2/gbrowse/mousebuild38/).

In addition to traditional query forms and data 
displays, MGD offers users several other ways to access 
data. The Batch Query tool (17)
(http://www.informatics.jax.org/batch) supports bulk access
to certain information about lists of genes. Users can upload
identifiers from a wide variety of sources (MGI, Entrez, Ensembl, etc.), have
those IDs matched to genes in MGI and download 
specified information for those genes, e.g. genome coord- 
inates and GO annotations. Results are available in 
HTML, tab delimited text or Excel format. Other parts 
of the web interface exploit this tool, allowing users to 
generate customized gene/feature summaries from query 
results. 
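
The matching step behind this kind of bulk lookup can be sketched in Python: identifiers from different sources are resolved to a single gene record through a cross-reference index, and the requested columns are written as tab-delimited text. The two gene records use the identifiers and coordinates shown in Figure 1; the in-memory tables, the column names and the function batch_query are assumptions for illustration and do not describe the Batch Query tool's implementation or output format.

```python
import csv
import io
from typing import Dict, Iterable

# Gene records keyed by MGI ID (values taken from Figure 1).
GENES: Dict[str, Dict[str, object]] = {
    "MGI:891971": {"symbol": "Serpina1a", "chr": "12",
                   "start": 103853589, "end": 103863562, "strand": "-"},
    "MGI:891970": {"symbol": "Serpina1b", "chr": "12",
                   "start": 103728156, "end": 103738158, "strand": "-"},
}

# Cross-reference index: any accepted identifier -> MGI ID.
XREF: Dict[str, str] = {
    "MGI:891971": "MGI:891971",
    "20700": "MGI:891971",               # Entrez Gene
    "OTTMUSG00000023942": "MGI:891971",  # VEGA
    "MGI:891970": "MGI:891970",
    "20701": "MGI:891970",               # Entrez Gene
    "OTTMUSG00000035410": "MGI:891970",  # VEGA
}

COLUMNS = ["symbol", "chr", "start", "end", "strand"]


def batch_query(identifiers: Iterable[str]) -> str:
    """Resolve identifiers to genes and return a tab-delimited report."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["input", "mgi_id"] + COLUMNS)
    for identifier in identifiers:
        mgi_id = XREF.get(identifier.strip())
        if mgi_id is None:
            # Keep unmatched inputs in the report so nothing is dropped silently.
            writer.writerow([identifier, "No match"] + [""] * len(COLUMNS))
        else:
            gene = GENES[mgi_id]
            writer.writerow([identifier, mgi_id] + [gene[c] for c in COLUMNS])
    return buffer.getvalue()


report = batch_query(["20700", "MGI:891970", "XYZ123"])
print(report)
```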

Subsets of MGI data are also available through in- 
stances of BioMart and InterMine. These popular data 
warehousing systems offer interactive web interfaces as 
well as programmable APIs via Web Services. Our 
BioMart instance contains two data sets: mouse genes 
and genome features, and mouse developmental gene ex- 
pression data. Our InterMine instance, called MouseMine 
(http://www.mousemine.org), contains the complete anno- 
tation 'core' of MGI, including the complete catalog of 
mouse genes, alleles and strains, plus annotations to the 
GO, Mammalian Phenotype Ontology and OMIM. 

Finally, MGI offers access to a large set of regularly 
updated database reports via our FTP site
(ftp://ftp.informatics.jax.org), and direct SQL access to a read-only
copy of the database (contact MGI user support for an 
account). MGI User Support can also assist users in 
generating custom reports on request. 

CITING MGD 

This article provides the general citation for use of the 
MGD resource. Please use the following format for 
citation when referencing data sets specific to the MGD 
component of the MGI resource: MGD, MGI, The 
Jackson Laboratory, Bar Harbor, Maine (URL:
http://www.informatics.jax.org). [Type in date (month, year)
when you retrieved the data cited.] 